The main object of this analysis is the searches and the navigation that follows each search. We derived this data from the original Wikimedia data and stored it in /data/search_data.csv.
Here, we explore that data.
library(tidyverse)
package 'tidyverse' was built under R version 3.4.4
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 2.2.1     v purrr   0.2.4
v tibble  1.4.2     v dplyr   0.7.4
v tidyr   0.7.2     v stringr 1.2.0
v readr   1.1.1     v forcats 0.2.0
package 'ggplot2' was built under R version 3.4.4
package 'tibble' was built under R version 3.4.4
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
library(here)
package 'here' was built under R version 3.4.4
here() starts at C:/Users/marcosasn/Documents/gitlocal/lab2-cp4-marcosasn
library(lubridate)
Attaching package: 'lubridate'
The following object is masked from 'package:here':
    here
The following object is masked from 'package:base':
    date
library(shiny)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
    last_plot
The following object is masked from 'package:stats':
    filter
The following object is masked from 'package:graphics':
    layout
theme_set(theme_bw())
buscas = read_csv(here::here("data/search_data.csv")) %>%
    head(100000)
Parsed with column specification:
cols(
  session_id = col_character(),
  search_index = col_integer(),
  session_start_timestamp = col_double(),
  session_start_date = col_datetime(format = ""),
  group = col_character(),
  results = col_integer(),
  num_clicks = col_integer(),
  first_click = col_integer()
)
buscas %>%
    ggplot(aes(x = results)) +
    geom_histogram(binwidth = 5)
# search_index: number of searches in the session
# What is our daily overall clickthrough rate? How does it vary between the groups?
# num_clicks
# group
# session_start_date
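The clickthrough question in the comments above can be sketched on a small made-up table. This is only an illustration under stated assumptions: the `toy_searches` data is hypothetical (it merely reuses the column names of search_data.csv), and "clickthrough rate" is assumed here to mean the share of sessions with at least one click, which the notebook itself does not define.

```r
library(dplyr)

# Hypothetical toy data: one row per search, same column names as search_data.csv
toy_searches <- tibble(
    session_id = c("s1", "s1", "s2", "s3", "s4", "s5", "s5", "s6"),
    session_start_date = as.POSIXct(
        rep(c("2016-03-01 10:00:00", "2016-03-02 10:00:00"), each = 4),
        tz = "UTC"),
    group = c("a", "a", "a", "b", "a", "b", "b", "b"),
    num_clicks = c(0L, 2L, 0L, 0L, 1L, 0L, 0L, 3L)
)

# Assumption: a session "clicked through" if any of its searches had a click
daily_ctr <- toy_searches %>%
    mutate(day = as.Date(session_start_date)) %>%
    group_by(day, group, session_id) %>%
    summarise(clicked = any(num_clicks > 0)) %>%
    group_by(day, group) %>%
    summarise(clickthrough_rate = mean(clicked)) %>%
    ungroup()

print(daily_ctr)
```

Dropping `group` from both `group_by()` calls would give the overall daily rate instead of the per-group one. Applying the same pipeline to `buscas` is left as an assumption about how the question might be answered, not as the notebook's own method.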
# Bar chart of the number of rows (searches) in each group
plot = buscas %>%
    group_by(group) %>%
    summarise(n = n()) %>%
    ggplot(aes(x = group, y = n)) +
    geom_col(
        aes(text = paste("Group:", group,
                         "<br>",
                         "Frequency:", n)),
        fill = "white", color = "blue") +
    ggtitle("Frequency distribution of the session group") +
    xlab("Group") +
    ylab("Frequency") +
    theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
Ignoring unknown aesthetics: text
div(ggplotly(plot, tooltip = "text", width = 700, height = 400), align = "center")
We recommend that you use the dev version of ggplot2 with `ggplotly()`
Install it with: `devtools::install_github('hadley/ggplot2')`
# Jittered scatter of the number of clicks per search, by group
plot = buscas %>%
    ggplot(aes(x = group, y = num_clicks)) +
    geom_jitter(aes(text = paste("Group:", group,
                                 "<br>Search index:", search_index,
                                 "<br>Clicks:", num_clicks)),
                alpha = .4, width = .2, size = .8, color = "blue") +
    ggtitle("Distribution of the number of clicks per search") +
    xlab("Group") +
    ylab("Number of clicks") +
    theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
Ignoring unknown aesthetics: text
div(ggplotly(plot, tooltip = "text", width = 700, height = 400), align = "center")
# Same scatter with a log-scaled y axis
plot = buscas %>%
    ggplot(aes(x = group, y = num_clicks)) +
    geom_jitter(aes(text = paste("Group:", group,
                                 "<br>Search index:", search_index,
                                 "<br>Clicks:", num_clicks)),
                alpha = .4, width = .2, size = .8, color = "blue") +
    scale_y_log10() +
    ggtitle("Distribution of the number of clicks per search (log scale)") +
    xlab("Group") +
    ylab("Number of clicks") +
    theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
Ignoring unknown aesthetics: text
div(ggplotly(plot, tooltip = "text", width = 700, height = 400), align = "center")
Transformation introduced infinite values in continuous y-axis
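The warning above is consistent with searches that received zero clicks: `log10(0)` is `-Inf`, so those points cannot be placed on a log-scaled axis. A minimal sketch of the effect and of one possible workaround, shifting the counts by one before taking the logarithm (the `clicks` vector is made up for illustration and is not taken from the data):

```r
# Hypothetical click counts, including a zero
clicks <- c(0, 9, 99)

# Plain log10 maps the zero to -Inf
plain_log <- log10(clicks)

# Shifting by one keeps zero-click searches finite (0 maps to 0)
shifted_log <- log10(clicks + 1)

print(plain_log)
print(shifted_log)
```

Whether a shifted scale is appropriate for this analysis is a judgment call that the original notebook does not make; the shift changes how the axis values should be read.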
# Histogram of the number of clicks per search, one panel per group
plot = buscas %>%
    ggplot(aes(x = num_clicks)) +
    geom_histogram(binwidth = 5, fill = "white", color = "blue") +
    facet_grid(group ~ .) +
    ggtitle("Frequency distribution of the number of clicks per search") +
    xlab("Number of clicks") +
    ylab("Frequency") +
    theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
div(ggplotly(plot, width = 700, height = 400), align = "center")